Transporting observational study results to a target population of interest using inverse odds of participation weighting

Inverse odds of participation weighting (IOPW) has been proposed to transport clinical trial findings to target populations of interest when the distribution of treatment effect modifiers differs between trial and target populations. We set out to apply IOPW to transport results from an observational study to a target population of interest. We demonstrated the feasibility of this idea with a real-world example using the nationwide Flatiron Health electronic health record (EHR)-derived de-identified database. First, we conducted an observational study that carefully adjusted for confounding to estimate the treatment effect of fulvestrant plus palbociclib relative to letrozole plus palbociclib as a second-line therapy among estrogen receptor (ER)-positive, human epidermal growth factor receptor 2 (HER2)-negative metastatic breast cancer patients. Second, we transported these findings to the broader cohort of patients who were eligible for a first-line therapy. The interpretation and validity of such findings, however, depend on the extent to which causal inference assumptions are met.


Introduction
Real-world evidence (RWE) is playing an increasingly important role in clinical decision making, especially when clinical trial data are not available or when trial samples do not represent target populations of interest [1,2]. Compared to clinical trial data, real-world data (RWD) are less costly to collect, can come from a variety of sources, and usually have larger sample sizes [3]. Despite shortcomings such as data quality issues, RWD can be used for a variety of tasks such as target-drug combination discovery, drug repurposing, or pragmatic trials [3,4]. The primary way of generating RWE from RWD is through observational studies [5-9]. Such studies complement clinical trials in providing evidence for medical practice and regulatory approval of drugs and devices [1,2,4,10].
Propensity score (PS) methods have been developed to address various sources of bias in observational studies. Specifically, inverse probability of treatment weighting (IPW) is one of the fundamental methods to address confounding bias resulting from nonrandom treatment assignment [11]. It can also be used to handle missing data [12]. More recently, IPW has emerged as another popular means to standardize clinical trial results, i.e., to correct for the bias arising from non-representativeness of the trial population relative to the target population [5,13-18]. When the trial is a subset of the target population, this class of studies is referred to as generalizability studies, and inverse probability of participation weighting is applied. When the target population does not overlap with the original trial population, inverse odds of participation weighting (IOPW) can be used for standardization [19,20]. This class of studies is referred to as transportability studies. In addition to standardizing clinical trial results, generalizability or transportability methods, when applied with rigor, can be a valuable tool to estimate treatment effects in the target population using results from observational studies. However, few studies have done so [21-23].
Motivated by a use case in which we are interested in comparing two existing cancer treatments approved for use in different lines of therapy, we set out to apply a transportability method to estimate the treatment effect in the target population based on results from an observational study. Using a rich observational dataset that contains both per-indication and off-label medication use, our first objective is to compare the efficacy of the two treatments head-to-head using inverse probability of treatment weighting (IPTW) to adjust for confounding in second-line patients. Next, we are interested in using IOPW to adjust for non-representativeness and standardize these results to the population of first-line patients. We will apply PS weighting twice in a row to correct for both nonrandom treatment assignment and non-representativeness of the study sample. We propose this two-step approach as a way of generating RWE, especially when few relevant trial results are available. Our study not only explores the opportunity of applying transportability methods to observational study results, but also adds to the body of literature in which multi-stage weighting was carried out to account for more than one source of bias (confounding, nonresponse, non-representativeness, etc.) [21,23-25].
Specifically, the two treatments of interest in our study are fulvestrant plus palbociclib and letrozole plus palbociclib. Although both have been approved for estrogen receptor (ER)-positive, human epidermal growth factor receptor 2 (HER2)-negative metastatic breast cancer (MBC) patients, they were intended for different lines of therapy [26,27]. Fulvestrant plus palbociclib has been found to be associated with longer overall survival than fulvestrant plus placebo in women of any menopausal status who had progressed on prior endocrine therapy [26]. Letrozole plus palbociclib has been found in another trial to result in longer progression-free survival than letrozole alone among postmenopausal women with no prior treatment for their advanced disease [27]. Interestingly, an observational study has found that both treatments have been used as first, second, third, and later lines of therapy [28]. Using RWD from the Flatiron Health database, we will first conduct a comparative effectiveness study in second-line patients with IPTW. The treatment effect obtained will then be standardized to first-line patients using IOPW.

Data source
This study used the nationwide Flatiron Health electronic health record (EHR)-derived de-identified database. The Flatiron Health database is a longitudinal database comprising de-identified patient-level structured and unstructured data, curated via technology-enabled abstraction [29,30]. The majority of patients in the database originated from community oncology settings; relative community to academic proportions may vary depending on the study cohort. During the study period, the de-identified data originated from approximately 280 US cancer clinics (~800 sites of care). The data are de-identified and subject to obligations to prevent re-identification and protect patient confidentiality.
Our study used de-identified data from 8,356 ER (+), HER2 (-) MBC patients diagnosed from January 1, 2014, to June 30, 2020, who had at least one line of treatment. Specifically, this general cohort was selected based on the following criteria:
• ICD diagnosis of breast cancer (ICD-9 174.x or 175.x or ICD-10 C50x)
• Evidence of Stage IV or recurrent MBC with a metastatic diagnosis date on or after January 1, 2014
• At least 2 documented clinical visits on or after January 1, 2014
• Evidence of treatment with at least one line of therapy for metastatic disease
• Evidence of ER (+) defined as having an ER (+) or PR (+) test before or up to 60 days after the start of first-line treatment
• Evidence of HER2 (-) defined by having a HER2 negative test (IHC negative (0-1+), FISH negative/not amplified, or negative NOS) and the absence of a positive test (IHC positive (3+), FISH positive/amplified, positive NOS) before or up to 60 days after the start date of first-line treatment
• No greater than 90 days gap between metastasis diagnosis date and first structured activity (vital information, medication administration, a non-cancelled drug order, or a reported laboratory test/result) after MBC diagnosis date (to identify patients who are likely to be missing treatment)
Real-world progression was defined as any event with EHR documentation of disease worsening, based on clinicians' reporting [31].

Original (second-line) cohort definition.
The cohort of second-line patients was defined based on the following inclusion/exclusion criteria: 1) all female adult patients (at time of second-line therapy initiation) who were treated with fulvestrant-palbociclib or letrozole-palbociclib no later than March 30, 2020, inclusive, to allow for at least 90 days of follow-up time; 2) excluding patients who were treated with CDK inhibitors (palbociclib, ribociclib, abemaciclib), mTOR inhibitors (everolimus), PI3K inhibitors (alpelisib), or any clinical study drug during first-line therapy, as a single drug or in combination therapy. Patients were treated with either fulvestrant-palbociclib or letrozole-palbociclib as a second-line therapy. Fourteen days after the start of second-line therapy was used as the index date for all patients in the cohort, since an immediate effect of the treatment on the outcome was not expected. Patients who progressed or died before the index date were removed.

Confounders.
The following covariates were extracted from the database: age at initiation of second-line therapy, race, stage at initial breast cancer diagnosis, Eastern Cooperative Oncology Group (ECOG) score within 30 days of second-line therapy start date, time from initial breast cancer diagnosis to MBC diagnosis, and medical practice type. Additionally, the number of metastatic sites recorded, visceral disease (whether metastatic disease is in the lung and/or liver), and bone-only metastasis were also determined before or up to 14 days after the initiation of second-line therapy. These covariates were used as confounders among second-line patients based on data availability and clinical significance. Later in Section 2.3.2, the same covariates were used as effect modifiers for the transportability portion of our study.

Endpoint.
Real-world progression-free survival (rwPFS) was used as the primary outcome in the study [32]. An event was defined as the first documented progression or death, whichever came earlier, that happened no earlier than 14 days after the index date. Date of death was set to the 15th of the month, since death data were only available at monthly granularity. All patients were censored at the time of the last clinical note or 3 years after the index date, whichever came earlier.
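As a rough illustration of the endpoint definition above, the following Python sketch derives a (time, event) pair for one patient. The function names, the approximation of 3 years as 1,095 days, and the rule that events are counted only up to that cutoff are assumptions made for illustration; they are not taken from the study code. Patients with an event before the index date are assumed to have been excluded already.

```python
from datetime import date, timedelta

# Assumption: "3 years" is approximated as 3 * 365 days.
MAX_FOLLOW_UP = timedelta(days=3 * 365)


def death_date_from_month(year, month):
    """Death is known only to the month, so place it on the 15th."""
    return date(year, month, 15)


def rwpfs(index_date, last_note_date, progression_date=None, death_date=None):
    """Return (days_from_index, event_flag) for one patient.

    event_flag is 1 for progression or death, 0 for censoring.
    """
    cutoff = index_date + MAX_FOLLOW_UP
    event_dates = [d for d in (progression_date, death_date) if d is not None]
    # Assumption: the earliest event counts if it falls on or before the cutoff.
    if event_dates and min(event_dates) <= cutoff:
        return (min(event_dates) - index_date).days, 1
    # Otherwise censor at the last clinical note or the cutoff, whichever is earlier.
    return (min(last_note_date, cutoff) - index_date).days, 0
```

For example, `rwpfs(date(2019, 1, 15), date(2019, 12, 31), death_date=death_date_from_month(2019, 3))` treats the death as an event on March 15, 2019.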

Statistical analysis.
IPTW was used to adjust for confounding by indication (Fig 1). Let A denote treatment, where A_i = 1 indicates fulvestrant-palbociclib and A_i = 0 letrozole-palbociclib. Let C denote the vector of confounders. The conditional probability of receiving fulvestrant-palbociclib, P, was estimated using a logistic regression model. Specifically, P_i = P(A_i = 1 | C_i), i = 1, 2, 3, ..., n_1, where n_1 denotes the number of patients in the second-line cohort. Stabilized weights (Table 1) were estimated for each patient and used to adjust for imbalance of baseline characteristics between the two treatment arms [11]. The weights were entered into a Cox proportional hazards model with a robust variance estimator for adjustment. Proportional
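The weighting step can be illustrated with a minimal Python sketch. Table 1 is not reproduced in this section, so the sketch assumes the commonly used stabilized form, in which the marginal probability of the treatment actually received is divided by its conditional probability given the confounders. The function name and the toy inputs are hypothetical.

```python
def stabilized_iptw(treatment, propensity):
    """Stabilized inverse probability of treatment weights.

    treatment:  list of 0/1 indicators (1 = fulvestrant-palbociclib)
    propensity: list of estimated P(A = 1 | C) for the same patients
    """
    p_treated = sum(treatment) / len(treatment)  # marginal P(A = 1)
    weights = []
    for a, p in zip(treatment, propensity):
        if a == 1:
            weights.append(p_treated / p)
        else:
            weights.append((1 - p_treated) / (1 - p))
    return weights


# Toy example with four hypothetical patients.
toy_weights = stabilized_iptw([1, 1, 0, 0], [0.8, 0.5, 0.5, 0.2])
```

Stabilization keeps the weights centered near 1, which tends to reduce the influence of patients with extreme estimated propensities.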


Transporting findings from the second- to the first-line patient cohort
When the treatment effect differs across levels of other factors, there is heterogeneity in the causal effect of treatment. We refer to these factors as effect modifiers. If these effect modifiers also affect the selection of trial participants from the target population of interest, the treatment effect estimated from the trial does not represent that in the entire target population [5,14,15]. Post-trial statistical methods can be applied to mitigate this issue [5,15,19,36-38]. Instead of modeling interaction terms directly, trial results can be weighted so that the distribution of effect modifiers in the trial resembles that of the target population [5,15,19,36-38]. The weights are created from the PS, the conditional probability of trial participation [5,15,19,36-38]. We refer to the scenario where the trial is a subset of the target population as generalizability. In contrast, if the trial and target population are disjoint, this is a transportability scenario. For example, a trial may have been conducted among patients diagnosed with a certain condition in 2005-2006, while investigators are interested in transporting the trial results to patients newly diagnosed with the condition in 2021.
We will use IOPW to transport the results from Section 2.2 to a first-line patient cohort. Covariates defined similarly to those in Section 2.2.2 were considered as effect modifiers among the first-line cohort: age at initiation of first-line therapy, race, stage at initial breast cancer diagnosis, ECOG score within 30 days of first-line therapy start date, time from initial breast cancer diagnosis to MBC diagnosis, and medical practice type. Additionally, the number of metastatic sites recorded, visceral disease (whether metastatic disease is in the lung and/or liver), and bone-only metastasis were also determined before or up to 14 days after the initiation of first-line therapy.

Endpoint.
No outcome variable from the target cohort was needed for this analysis.

Statistical analysis.
First, the extent to which the distribution of potential effect modifiers differed between the two cohorts was assessed using standardized mean differences (SMD) [11]. Next, data on effect modifiers from the original and target cohorts were concatenated. An indicator variable S denotes whether an individual belongs to the original (S_i = 1) or the target (S_i = 0) cohort. Logistic regression was used to estimate the conditional probability of being in the original cohort, Q, given all effect modifiers E. More formally, Q_i = P(S_i = 1 | E_i), i = 1, 2, 3, ..., n_1 + n_2, where n_2 denotes the number of patients in the first-line cohort.
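For a continuous covariate, the SMD used in the first step can be sketched as follows. This is a minimal Python illustration assuming the common definition (difference in means divided by the square root of the average of the two group variances). The function name and the toy ages are hypothetical, and categorical covariates would need a proportion-based variant.

```python
from statistics import mean, variance


def smd(x_original, x_target):
    """Standardized mean difference for one continuous covariate."""
    pooled_sd = ((variance(x_original) + variance(x_target)) / 2) ** 0.5
    return (mean(x_original) - mean(x_target)) / pooled_sd


# Toy example: ages in a hypothetical original and target cohort.
age_smd = smd([60, 65, 70], [55, 60, 65])

# A common rule of thumb flags imbalance when |SMD| exceeds 0.1.
imbalanced = abs(age_smd) > 0.1
```

An absolute SMD threshold of 0.1 is the cutoff reported in the Results for judging balance after weighting.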
To assess the similarity of the cohorts based on the distribution of Q, the Tipton index was computed [38]. The Tipton index is defined as $\int \sqrt{f(p_1)\, f(p_2)}\, dp$, where p_1 and p_2 denote the vectors of propensity scores estimated in the original and target cohorts, respectively, and f(·) denotes the corresponding estimated density [38]. The Tipton index is a unitless similarity metric that ranges from 0 to 1. A higher Tipton index indicates that the two populations are more similar, with values above 0.8 indicating high similarity [38]. In that case, the application of transportability methods involves less extrapolation and thus warrants higher confidence in the results [38].
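The integral can be approximated by binning the propensity scores and summing the square roots of the products of bin proportions. The sketch below is one such approximation in Python, assuming equal-width bins on [0, 1] and a hypothetical function name; it is not the implementation used in the study.

```python
def tipton_index(scores_original, scores_target, n_bins=10):
    """Binned approximation of the Tipton index for scores in [0, 1]."""

    def bin_proportions(scores):
        counts = [0] * n_bins
        for s in scores:
            # A score of exactly 1.0 is placed in the last bin.
            k = min(int(s * n_bins), n_bins - 1)
            counts[k] += 1
        return [c / len(scores) for c in counts]

    p = bin_proportions(scores_original)
    q = bin_proportions(scores_target)
    return sum((a * b) ** 0.5 for a, b in zip(p, q))
```

Identical score distributions give a value of 1, and distributions with no shared bins give 0. The result depends on the number of bins, so `n_bins` should be chosen with the sample size in mind.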
Next, IOPW was implemented to adjust for the non-representativeness of the original cohort as compared to the target population of first-line patients (Fig 1). The weights for adjusting for the differential distribution of effect modifiers were obtained by multiplying the weights from Section 2.2.4 by the inverse odds of being in the original cohort [19]. To transport the treatment effect from Section 2.2 to the new target population, a Cox proportional hazards model with a robust variance estimator was fitted to individuals in the original cohort only, using weights V that were estimated from the two cohorts combined and including no covariates in the model (Table 1). Note that neither the treatment nor the outcome variable from the new target population was used in the analysis. Only potential effect modifiers were used to estimate the weights for the Cox model. As in the IPTW analysis described in Section 2.2.4, MI-derPassive INT-within was used to impute missing data [33,34]. Specifically, covariates from both the original and target cohorts, together with treatment and outcome variables from the original cohort, were included in the imputation model [35]. Last, SMD was again used to quantify the difference between the two populations after weighting. All statistical analyses were conducted in R version 4.0.3 [40]. R code used for modeling can be found at https://github.com/yling2019/SERD_transportability.
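The combination of the two weighting steps can be sketched in Python as follows. Because Table 1 is not reproduced in this section, the sketch assumes the standard inverse odds form that the method's name implies: each original-cohort patient's IPTW weight is multiplied by (1 - Q_i) / Q_i, where Q_i is the estimated probability of being in the original cohort. The function name and toy numbers are hypothetical.

```python
def iopw_weights(iptw_weights, participation_prob):
    """Combined weights V for patients in the original cohort.

    iptw_weights:       step-1 stabilized treatment weights
    participation_prob: estimated Q_i = P(S_i = 1 | E_i) for the same patients
    """
    return [w * (1 - q) / q for w, q in zip(iptw_weights, participation_prob)]


# Toy example: the second patient resembles the target cohort more closely
# (low Q), so that patient is up-weighted.
toy_v = iopw_weights([1.0, 0.625], [0.5, 0.2])
```

Patients who look more like the target cohort have a lower Q_i and therefore receive larger weights, which shifts the weighted original cohort toward the target distribution of effect modifiers.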

Results
There were 752 patients in the original second-line cohort, 397 of whom were treated with fulvestrant-palbociclib and 355 with letrozole-palbociclib (Table 2). Compared to patients in the letrozole-palbociclib arm, patients treated with fulvestrant-palbociclib were older and included larger proportions of White patients, patients with an ECOG score of 0, and patients initially diagnosed with Stage 0-3 disease (Table 2). The gap between initial breast cancer diagnosis and MBC diagnosis tended to be more than 1 year for patients treated with fulvestrant-palbociclib. The median follow-up times for the fulvestrant-palbociclib and letrozole-palbociclib arms were 212.0 days (mean = 312.6, standard deviation (SD) = 288.0) and 260.0 days (mean = 382.6, SD = 335.2), respectively. A total of 528 events were observed, 280 in the fulvestrant-palbociclib arm and 248 in the letrozole-palbociclib arm. After IPTW, SMDs for all covariates were below 0.1 in all m = 36 imputed datasets. For each imputed dataset, the proportional hazards assumption was met based on tests of Schoenfeld residuals (median p-value 0.9517, min 0.7955, max 0.9974). The IPTW-adjusted hazard ratio (HR) was 1.11 with 95% confidence interval [0.93, 1.32]. This is consistent with the Kaplan-Meier curves (Fig 2A), where the two curves are very close to each other, even though the probability of survival was slightly higher in the letrozole-palbociclib arm.
The target population consisted of 3,109 patients who were eligible to receive first-line treatment for their MBC (Table 3). Compared to the original second-line cohort, this target first-line patient population was younger (Table 3). It also had a larger Black representation, more people diagnosed with de novo MBC, fewer metastatic sites, and a smaller gap between initial breast cancer diagnosis and MBC diagnosis (Table 3). The mean Tipton index among all m = 25 multiply imputed datasets was 0.81 (SD = 0.01) before weighting, indicating that the two populations are similar enough to carry out a transportability study [16]. After IOPW, SMDs for the covariates were below 0.1 across imputed datasets, with the exception of ECOG score: its SMD exceeded 0.1 in some imputed datasets, although its mean SMD was 0.10 with SD 0.01. For each imputed dataset, the proportional hazards assumption was met based on tests of Schoenfeld residuals (median p-value 0.8947, min 0.4424, max 0.9988). The IOPW-adjusted hazard ratio was 1.12 with 95% confidence interval [0.90, 1.39]. This result is also consistent with the Kaplan-Meier curves (Fig 2B), where both the median survival times and the overall curves are very similar to each other.
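As a back-of-the-envelope check on the two reported intervals, the sketch below recovers an approximate standard error of the log hazard ratio from each 95% confidence interval. It assumes the intervals are symmetric on the log scale (Wald-type) with a normal critical value of 1.96, which the source does not state; the function names are hypothetical.

```python
import math


def log_hr_se(lower, upper, z=1.96):
    """Approximate SE of log(HR) implied by a Wald-type confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_includes_null(lower, upper):
    """True when the interval contains HR = 1 (no difference)."""
    return lower <= 1.0 <= upper


# Reported intervals: IPTW (second-line) and IOPW (transported to first-line).
se_iptw = log_hr_se(0.93, 1.32)
se_iopw = log_hr_se(0.90, 1.39)
```

Under these assumptions the transported estimate has the larger standard error, which matches the wider interval reported after IOPW, and both intervals include 1.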

Discussion
Initially using IPTW to address confounding, we estimated the HR of fulvestrant-palbociclib compared to letrozole-palbociclib in second-line patients to be 1.11 with 95% confidence interval [0.93, 1.32]. Next, we additionally applied IOPW to transport these results to a first-line patient population and obtained an HR of 1.12 with 95% confidence interval [0.90, 1.39]. For both patient cohorts, the small beneficial treatment effect of letrozole-palbociclib was not statistically significant. The similarity of the two hazard ratios indicates little treatment effect modification by the selected demographic and clinical variables in our study.
Our study aimed to generate RWE using the rich dataset from Flatiron Health. However, we are aware of the limits of such evidence, given that our results were transported from a cohort study. Thus, our results need to be considered in conjunction with other types of evidence in the literature. One future direction is to conduct an observational study directly in a first-line patient cohort, comparing the two treatments of interest using off-label medication use. There has been only one clinical trial comparing fulvestrant-palbociclib vs letrozole-palbociclib as a first-line therapy [41]. In that trial, 486 women with MBC and no prior treatment for their metastatic disease were randomly assigned in a 1:1 ratio to either treatment arm, with investigator-assessed progression-free survival (PFS) as the primary end point. The HR was 1.13 with 95% confidence interval [0.89, 1.45], and similar results were observed when stratified by type of metastatic disease and visceral involvement. Although there are differences between this trial and our study (e.g., study design, study end point), neither found fulvestrant-palbociclib to have a different treatment effect from letrozole-palbociclib.
There are many limitations to our study. First and foremost, transportability studies usually assume the outcome-generating functions of the two populations to be the same. In our real-world example, this means that we assumed progression or death happened through a similar pathway in the second- and first-line cohorts after treatment with fulvestrant-palbociclib relative to treatment with letrozole-palbociclib. While this assumption is impossible to test, we can assess its plausibility based on biological and epidemiological considerations. For example, consider the fact that all second-line patients have already been exposed to a first-line treatment. This drug exposure history may affect the outcome directly or, more importantly, have a synergistic effect with the treatment of interest that alters its effect. If this were the case, the treatment effect would be impossible to estimate, simply because we do not know how first-line therapy patients will be treated in the future, nor do we have the actual data to adjust for prior history of treatment. In addition, treatments in the neoadjuvant or adjuvant setting could also be part of a patient's drug exposure history, which was not captured in our study. Thus, we must be willing to assume that such treatments would have a negligible impact on the treatment effect. Additionally, while our data example illustrated a transportability scenario (where the trial sample is disjoint from the target population) [19], our original and target populations are not independent of each other. All patients in the second-line cohort were also eligible to be included in the first-line cohort, although the covariate space differs depending on the respective cohort they are in. While these two datasets were related, we believe that the concept of transportability is still appropriate here, because our first-line cohort is still more representative of a typical first-line cohort in the real world.
We also acknowledge that the validity of our findings in the first-line cohort depends on the internal validity of the results in the second-line cohort, which is not as strong as that of a clinical trial. However, no such trial has been conducted in second-line patients, and we therefore conducted this study to generate RWE. Lastly, we are limited by our dataset and may not have captured all potential confounders and effect modifiers in our analyses.
We have demonstrated the feasibility of transporting results from a cohort study as a way to generate RWE when limited clinical evidence is available. Investigators need to exercise caution in interpreting the findings that result from such applications. Specifically, the assumptions underlying the validity of such methods need to be met, and the challenges of achieving this may be greater when transporting from observational findings than from randomized clinical trials. As more well-designed observational studies are conducted to emulate clinical trials [13,42,43], such applications will further motivate methodologists and applied researchers to conduct more high-quality observational studies and make better use of such studies to provide RWE [44].